feat(remote): Add checksum path filters for download and extract tasks (fixes #68).#113

Merged
Bill-hbrhbr merged 4 commits into y-scope:main from Bill-hbrhbr:add-checksum-path-patterns-for-download-and-extract
Apr 29, 2026
Conversation

Contributor

@Bill-hbrhbr Bill-hbrhbr commented Apr 23, 2026

Description

(closes #69)

For npm packages, we sometimes want to checksum only a subset of the downloaded source, or to ignore in-place updates in known locations such as node_modules. node_modules should live inside the source tree because package managers like npm and Yarn install relative to the project root, and many tools assume this layout.

However, these updates can cause checksum drift. Adding CHECKSUM_EXCLUDE_PATTERNS to download and extract tasks is the most direct and practical way to handle this. Beyond node_modules, it can filter out build outputs, caches, logs, generated docs, and other nonessential files that do not affect the core content of the source tree.

Alongside this, we introduce CHECKSUM_INCLUDE_PATTERNS for completeness, but it is narrower in scope. It is useful when only a small, well-defined portion of a larger directory is the actual artifact of interest and the rest is generated. Currently, it defaults to the entire extraction output directory, preserving the behavior from before this PR.
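
To illustrate the checksum drift described above, here is a minimal Python sketch of a directory checksum that skips excluded paths. It is not the implementation used by the checksum tasks; `scoped_checksum`, `demo`, and the `node_modules/*` pattern are hypothetical names chosen for the example, and the matching uses Python's `fnmatch`, which may differ from the matching the tasks perform.

```python
import fnmatch
import hashlib
import os
import tempfile


def scoped_checksum(root, exclude_patterns=()):
    """Hash every file under root whose relative path matches no exclude pattern."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            # Assumption: patterns are matched against paths relative to root.
            if any(fnmatch.fnmatchcase(rel, pat) for pat in exclude_patterns):
                continue
            digest.update(rel.encode())
            with open(full, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()


def demo():
    """Return (scoped checksum unchanged, unscoped checksum unchanged)."""
    with tempfile.TemporaryDirectory() as root:
        dep_dir = os.path.join(root, "node_modules", "dep")
        os.makedirs(dep_dir)
        with open(os.path.join(root, "package.json"), "w") as f:
            f.write("{}")
        with open(os.path.join(dep_dir, "index.js"), "w") as f:
            f.write("v1")

        patterns = ["node_modules/*"]
        scoped_before = scoped_checksum(root, patterns)
        unscoped_before = scoped_checksum(root)

        # Simulate an in-place update inside node_modules.
        with open(os.path.join(dep_dir, "index.js"), "w") as f:
            f.write("v2")

        return (
            scoped_before == scoped_checksum(root, patterns),
            unscoped_before == scoped_checksum(root),
        )
```

In this sketch, the update under node_modules changes the unscoped checksum but leaves the scoped one untouched, which is the behavior the exclude patterns are meant to provide.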

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • The newly added unit tests pass.

Summary by CodeRabbit

  • New Features

    • Tar and zip extraction tasks now accept configurable checksum include/exclude patterns to control which files are considered during checksum validation and computation, with sensible defaults.
  • Tests

    • Added tests verifying exclude/include scopes and that checksum-scoped runs correctly skip work when only out-of-scope files change.
  • Documentation

    • Updated parameter docs to clarify glob-style matching against full paths for checksum include/exclude patterns.

@Bill-hbrhbr Bill-hbrhbr requested a review from a team as a code owner April 23, 2026 17:16
@Bill-hbrhbr Bill-hbrhbr requested a review from junhaoliao April 23, 2026 17:16
Contributor

coderabbitai Bot commented Apr 23, 2026

Walkthrough

The tar and zip extraction tasks now accept CHECKSUM_INCLUDE_PATTERNS and CHECKSUM_EXCLUDE_PATTERNS. Checksum validation and computation were updated to use those patterns (passed to the checksum tasks as INCLUDE_PATTERNS / EXCLUDE_PATTERNS). New tests exercise checksum exclude/include behaviours for zip extraction.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Remote task inputs: exports/taskfiles/utils/remote.yaml | Added public inputs CHECKSUM_INCLUDE_PATTERNS and CHECKSUM_EXCLUDE_PATTERNS for download-and-extract-tar and download-and-extract-zip; checksum compute/validate now pass derived INCLUDE_PATTERNS/EXCLUDE_PATTERNS instead of always using {{.OUTPUT_DIR}}. |
| Checksum tests: taskfiles/remote/tests.yaml | Added two public test tasks: download-and-extract-zip-test-checksum-exclude (compares checksums when excluding paths via checksum vs extraction) and download-and-extract-zip-test-checksum-include (verifies include-scoped checksum prevents rework when out-of-scope files change). |
| Checksum docs/comments: exports/taskfiles/utils/checksum.yaml | Updated JSDoc-style parameter comments for EXCLUDE_PATTERNS in compute and validate tasks to clarify they are glob matches against full paths rooted at INCLUDE_PATTERNS (documentation-only change). |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TaskRunner
    participant Downloader
    participant Extractor
    participant Checksum
    Client->>TaskRunner: run download-and-extract-(zip|tar) with CHECKSUM_*_PATTERNS
    TaskRunner->>Downloader: fetch archive
    Downloader-->>TaskRunner: archive
    TaskRunner->>Extractor: extract archive (pass EXCLUDE_PATTERNS)
    Extractor-->>TaskRunner: extracted files
    TaskRunner->>Checksum: compute/validate (pass CHECKSUM_INCLUDE_PATTERNS, CHECKSUM_EXCLUDE_PATTERNS)
    Checksum-->>TaskRunner: checksum result / skip decision
    TaskRunner-->>Client: task result (skipped or updated)
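
The skip decision at the end of the sequence above can be sketched in Python as follows. This is an assumption-laden illustration rather than the tasks' actual logic: `run_with_checksum`, `demo`, the `output.checksum` file name, and the string "digests" are all hypothetical stand-ins.

```python
import os
import tempfile


def run_with_checksum(checksum_file, compute_checksum, do_work):
    """Skip do_work when the stored checksum matches the current scoped checksum."""
    stored = None
    if os.path.exists(checksum_file):
        with open(checksum_file) as f:
            stored = f.read().strip()
    if stored is not None and stored == compute_checksum():
        return "skipped"
    do_work()  # stands in for downloading and extracting the archive
    with open(checksum_file, "w") as f:
        f.write(compute_checksum())
    return "updated"


def demo():
    state = {"in_scope": "a", "out_of_scope": "x"}

    def compute():
        # Hypothetical scoped checksum: only in-scope content contributes.
        return "digest-of-" + state["in_scope"]

    def work():
        pass  # no real download in this sketch

    with tempfile.TemporaryDirectory() as tmp:
        checksum_file = os.path.join(tmp, "output.checksum")
        results = [run_with_checksum(checksum_file, compute, work)]
        state["out_of_scope"] = "y"  # out-of-scope change
        results.append(run_with_checksum(checksum_file, compute, work))
        state["in_scope"] = "b"  # in-scope change
        results.append(run_with_checksum(checksum_file, compute, work))
    return results
```

Under these assumptions, the first run does the work, a change outside the checksum scope is skipped, and a change inside the scope triggers the work again.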

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding checksum path filters (CHECKSUM_EXCLUDE_PATTERNS and CHECKSUM_INCLUDE_PATTERNS) to remote download and extract tasks. |
| Linked Issues check | ✅ Passed | The PR satisfies all coding requirements from issue #69: implements CHECKSUM_EXCLUDE_PATTERNS and CHECKSUM_INCLUDE_PATTERNS inputs for download-and-extract tasks, adds comprehensive tests validating exclude and include pattern behavior, and maintains backward compatibility with default inclusion of entire output directory. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the PR objectives: task input additions, test cases for the new features, and documentation-only updates to checksum parameter comments clarifying path semantics. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Member

@junhaoliao junhaoliao left a comment

let's refer to #69 and add tests to cover those parameters

the title should include (resolves #68).

Member

@junhaoliao junhaoliao left a comment

waiting for tests to be ported over then this is good to go

@Bill-hbrhbr Bill-hbrhbr changed the title from "feat(remote): Add checksum path filters for download and extract tasks." to "feat(remote): Add checksum path filters for download and extract tasks (fixes #68)." Apr 28, 2026
@Bill-hbrhbr Bill-hbrhbr linked an issue Apr 28, 2026 that may be closed by this pull request
@Bill-hbrhbr Bill-hbrhbr requested a review from junhaoliao April 28, 2026 12:38
Member

@junhaoliao junhaoliao left a comment

the rest lgtm. just docstring issues

Comment thread exports/taskfiles/utils/remote.yaml Outdated
Comment on lines +63 to +64
# @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any
# `CHECKSUM_INCLUDE_PATTERNS`, to exclude from the checksum.
Member

the "relative to any CHECKSUM_INCLUDE_PATTERNS" wording seems ambiguous. since whether CHECKSUM_INCLUDE_PATTERNS is specified or not, those patterns will be excluded, i think it's fine to remove the wording to avoid confusion?

Suggested change
- # @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any
- # `CHECKSUM_INCLUDE_PATTERNS`, to exclude from the checksum.
+ # @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns to exclude from the
+ # checksum computation.

Contributor Author

@Bill-hbrhbr Bill-hbrhbr Apr 29, 2026

we should follow the EXCLUDE_PATTERNS docstring for checksum:compute and checksum:validate.

How about:

# @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any `CHECKSUM_INCLUDE_PATTERNS` (or OUTPUT_DIR by default), to exclude from the checksum.

Member

@junhaoliao junhaoliao Apr 29, 2026

how about

# @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns to omit from the
#   checksum. Each pattern applies to all paths selected for checksum computation,
#   whether the paths are from `INCLUDE_PATTERNS` or `OUTPUT_DIR`.

cc @davidlion

Comment thread exports/taskfiles/utils/remote.yaml Outdated
Comment on lines +152 to +153
# @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any
# `CHECKSUM_INCLUDE_PATTERNS`, to exclude from the checksum.
Member

Suggested change
- # @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns, relative to any
- # `CHECKSUM_INCLUDE_PATTERNS`, to exclude from the checksum.
+ # @param {string[]} [CHECKSUM_EXCLUDE_PATTERNS=[]] Path wildcard patterns to exclude from the
+ # checksum computation.

Comment thread taskfiles/remote/tests.yaml Outdated
Comment thread taskfiles/remote/tests.yaml Outdated
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@taskfiles/remote/tests.yaml`:
- Around line 163-165: CHECKSUM_EXCLUDE_PATTERNS currently contains hardcoded
fixture paths (".github/CODEOWNERS", ".github/PULL_REQUEST_TEMPLATE.md");
replace these literal strings with the shared extracted-path variables already
defined in the repo (use the existing extracted-path vars rather than
hardcoding) so the patterns reference those variables in
taskfiles/remote/tests.yaml (update the CHECKSUM_EXCLUDE_PATTERNS array to use
the shared vars and ensure those vars are imported/available in the same scope).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 869e3d2d-c4ac-4862-8ea2-08ec52815e48

📥 Commits

Reviewing files that changed from the base of the PR and between 9dca0ae and fe47052.

📒 Files selected for processing (1)
  • taskfiles/remote/tests.yaml

Comment thread taskfiles/remote/tests.yaml
@Bill-hbrhbr Bill-hbrhbr requested a review from junhaoliao April 29, 2026 20:38
Contributor

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (2)
exports/taskfiles/utils/checksum.yaml (1)

13-14: ⚠️ Potential issue | 🟡 Minor

Clarify EXCLUDE_PATTERNS semantics to avoid implying root-anchored matches.

Line 13 and Line 78 currently describe excludes as matching “full paths rooted at INCLUDE_PATTERNS”, but matching is broader in practice (non-anchored wildcard matching under the include set). This wording can cause incorrect pattern authoring.

Suggested doc wording
-  # `@param` {string[]} [EXCLUDE_PATTERNS=[]] Path wildcard patterns to exclude from checksum
-  # computation, applied as glob-style matches to full paths rooted at the `INCLUDE_PATTERNS`.
+  # `@param` {string[]} [EXCLUDE_PATTERNS=[]] Path wildcard patterns to exclude from checksum
+  # computation, matched against paths within the `INCLUDE_PATTERNS` scope.

Based on learnings: CHECKSUM_EXCLUDE_PATTERNS values are expected relative to the extracted content root (for example, .github/CODEOWNERS), without the top-level extracted directory prefix.

Also applies to: 78-79
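
The distinction between root-anchored and non-anchored matching raised in this comment can be illustrated with a small Python sketch. It approximates non-anchored matching by testing the pattern against every suffix of the path that starts at a component boundary; this is an assumption made for illustration, not a description of how the checksum tasks match patterns, and `matches_anchored`, `matches_unanchored`, and the `out/repo` prefix are hypothetical.

```python
import fnmatch


def matches_anchored(path, pattern):
    """True only if the pattern matches the whole path from its first component."""
    return fnmatch.fnmatchcase(path, pattern)


def matches_unanchored(path, pattern):
    """True if the pattern matches the path starting at any component boundary."""
    parts = path.split("/")
    return any(
        fnmatch.fnmatchcase("/".join(parts[i:]), pattern)
        for i in range(len(parts))
    )


# Hypothetical extracted layout: the archive was unpacked under out/repo.
full_path = "out/repo/.github/CODEOWNERS"
pattern = ".github/CODEOWNERS"

anchored_result = matches_anchored(full_path, pattern)
unanchored_result = matches_unanchored(full_path, pattern)
```

With a pattern written relative to the extracted content root, the anchored check fails because the full path carries the output directory prefix, while the unanchored check succeeds. That is why wording that implies root-anchored matching could lead callers to write patterns that never match.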

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@exports/taskfiles/utils/checksum.yaml` around lines 13 - 14, Update the
documentation wording for EXCLUDE_PATTERNS (and CHECKSUM_EXCLUDE_PATTERNS) to
state that exclude patterns are non-anchored glob patterns applied relative to
the extracted content root (e.g., ".github/CODEOWNERS") and match anywhere under
the included set, not as full paths rooted at the INCLUDE_PATTERNS prefix;
replace the phrase “full paths rooted at the `INCLUDE_PATTERNS`” with a
clarification that patterns are evaluated against paths relative to the
extracted content root and may match any descendant path under the included
patterns.
exports/taskfiles/utils/remote.yaml (1)

63-67: ⚠️ Potential issue | 🟡 Minor

Mirror the same docstring clarification here for checksum exclude patterns.

Line 63 and Line 153 repeat the same “full paths rooted at CHECKSUM_INCLUDE_PATTERNS” phrasing, which can mislead callers about how to author excludes.

Based on learnings: in remote tests, CHECKSUM_EXCLUDE_PATTERNS entries are relative to extracted content root (for example, .github/CODEOWNERS) and are not interchangeable with prefixed extracted-directory paths.

Also applies to: 153-157

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@exports/taskfiles/utils/remote.yaml` around lines 63 - 67, Update the
docstring for CHECKSUM_EXCLUDE_PATTERNS to mirror the clarification added for
includes: state that exclude patterns are matched against paths relative to the
extracted content root (e.g., ".github/CODEOWNERS") and are not prefixed by the
extracted output directory or interchangeable with CHECKSUM_INCLUDE_PATTERNS
roots; update both occurrences that currently say “full paths rooted at the
`CHECKSUM_INCLUDE_PATTERNS`” to this clarified wording and keep the existing
example of CHECKSUM_INCLUDE_PATTERNS ({{.OUTPUT_DIR}}) unchanged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6340ee8c-487f-4734-a2fa-40087fe422c1

📥 Commits

Reviewing files that changed from the base of the PR and between fe47052 and 4e124ae.

📒 Files selected for processing (2)
  • exports/taskfiles/utils/checksum.yaml
  • exports/taskfiles/utils/remote.yaml

@Bill-hbrhbr Bill-hbrhbr merged commit 2facad8 into y-scope:main Apr 29, 2026
7 checks passed
Development

Successfully merging this pull request may close these issues.

Remote utils should pass EXCLUDE_PATTERNS to the checksum tasks.

2 participants